Efficient Detection of Repeating Sites to Accelerate Phylogenetic Likelihood Calculations

Authors

  • K. Kobert
  • A. Stamatakis
  • T. Flouri
Abstract

The phylogenetic likelihood function (PLF) is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection, and divergence times estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted for improving run-time and, using appropriate data structures, reducing memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 12-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the PLF currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation. [Algorithms; maximum likelihood; phylogenetic likelihood function; phylogenetics].
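The idea of detecting repeating sites can be illustrated with a minimal sketch. It is not the authors' implementation: the tree encoding (a leaf name or a pair of subtrees), the function name `site_classes`, and the toy alignment are all assumptions made for illustration. At each node, every site receives a class identifier; two sites share an identifier exactly when the characters at all leaves below that node match, so the conditional likelihood entry would only need to be computed once per class. Ambiguity codes and gaps are treated as ordinary characters here.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree encoding: a leaf is its taxon name, an inner node is a
# pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def site_classes(tree: Tree, alignment: Dict[str, str]) -> List[int]:
    """Return one class id per alignment site for the subtree rooted at `tree`.

    Sites with equal ids have identical data at every leaf of the subtree,
    so their conditional likelihood entries at this node are identical.
    """
    if isinstance(tree, str):
        # Leaf: the class of a site is determined by its character.
        keys = list(alignment[tree])
    else:
        # Inner node: the class is determined by the pair of child classes.
        left = site_classes(tree[0], alignment)
        right = site_classes(tree[1], alignment)
        keys = list(zip(left, right))

    # Number the distinct keys in order of first appearance.
    ids: Dict[object, int] = {}
    return [ids.setdefault(key, len(ids)) for key in keys]


if __name__ == "__main__":
    toy_alignment = {"A": "ACAC", "B": "GGGG", "C": "TTAT"}
    toy_tree = (("A", "B"), "C")
    classes = site_classes(toy_tree, toy_alignment)
    print(classes, "->", len(set(classes)), "distinct of", len(classes), "sites")
```

In this toy case the inner node above A and B has only two distinct site classes out of four sites, so half of its likelihood operations would be redundant. A practical implementation would store the class ids per node during a single post-order traversal rather than recomputing them recursively.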

Similar resources

The influence of phylogenetic uncertainty on the detection of positive Darwinian selection.

The power of maximum likelihood tests of positive selection on protein-coding genes depends heavily on detecting and accounting for potential biases in the studied data set. Although the influence of transition:transversion and codon biases have been investigated in detail, little is known about how inaccuracy in the phylogeny used during the calculations affects the performance of these tests....

Genetic and Phylogenetic Analysis of Adani Goat Population Based on Cytochrome B Gene

Identification of genetic characteristics is an important factor for preservation of species life. The aim of this study was to identify the genetic characteristics of the Adani goat populations based on the cytochrome b (Cyt b) gene and to detect its phylogenetic relationships with the domestic and wild goat species using NCBI database. Blood samples were taken from 12 Adani goats and subseq...

Faster likelihood calculations on trees

Calculating the likelihood of observed DNA sequence data at the leaves of a tree is the computational bottleneck for phylogenetic analysis by Bayesian methods or by the method of maximum likelihood. Because analysis of even moderately sized data sets can require hours of computational time on fast desktop computers, algorithmic changes that substantially increase the speed of the basic likeliho...

Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures.

The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here,...

BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-th...

Journal title:

Volume 66, Issue

Pages -

Publication date: 2017